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Abstract 


The  problem  of  estimating  the  parameters  of  a  non-Gaussian 
autoregressive  process  is  addressed.  Departure  of  the  driving  noise 
from  Gaussianity  is  shown  to  have  the  potential  of  improving  the 
accuracy  of  the  estimation  of  the  parameters.  Vhile  the  standard  linear 
prediction  techniques  are  computationally  efficient,  they  show  a 
substantial  loss  of  efficiency  when  applied  to  non-Gaussian  processes. 
A  maximum  likelihood  estimator  is  proposed  for  more  precise  estimation 
of  the  parameters  of  these  processes  coupled  with  a  realistic  non- 
Gaussian  model  for  the  driving  noise.  The  performance  is  compared  to 
that  of  the  linear  prediction  estimator  and  as  expected  the  maximum 
likelihood  estimator  displays  a  marked  improvement. 


I .  Introduct ion 


Estimation  of  the  parameters  of  a  time  series  model  has  been  a 
widely  addressed  problem.  Rational  transfer  function  or  time  series 
models  are  the  most  popular  of  these.  Autoregressive  (AR)  models  are 
used  more  often  than  the  moving  average  and  autoregressive  moving 
average  models  because  of  the  inherent  mathematical  simplicity. When 
the  driving  noise  is  Gaussian,  the  maximum  likelihood  estimator  (MLE) 
is  easily  found  for  the  AR  model  under  reasonable  assumptions.  [3] 

Although  the  Gaussian  model  for  the  driving  noise  is  appropriate 
for  a  wide  variety  of  problems,  in  some  applications  it  is  not.  The 
noise  encountered  in  underwater  detection  problems  is  often 
characterized  by  the  presence  of  sharp  spikes  due  to  ice  break-up  and 
offshore  drill ing. MI . C5]  Spikes  are  also  common  in  the  atmospheric 
noise  encountered  in  low  frequency  communication  systems.  [*>]  The 
Gaussian  model  is  not  flexible  enough  to  incorporate  these  high- 
amplitude  events  mainly  because  of  the  sharp  roll-off  of  the 
probability  density  function  (PDF). 

A  variety  of  alternative  models  have  been  proposed.  A  PDF  that  is 
appropriate  for  a  non~Gaussian  process  with  'heavy  tails”  is  the 
Gaussian  mixture  model 


where  Eg(0)  an<j  Ei(u)  are  Gaussian  PDF's  with  parameters  [pg,o|]  and 
[„.«*!  respectively.^!  Subscripts  B  and  I  are  used  to  denote 
background  and  interference,  respectively.  Assuming  of  >>  a|f  one  can 
allow  for  a  wide  range  of  amplitudes  and  frequencies  of  occurence  of 
spikes  by  appropriately  choosing  of  and  e,  the  mixture  parameter. 

Once  the  driving  process  is  characterized,  the  AR  model  can 
describe  a  large  set  of  correlation  patterns  with  only  a  few  number  of 
parameters.  Most  of  the  natural  noise-channels,  such  as  the  soil  and 
the  sea-water,  are  low-pass  in  nature  and  the  processes  at  their  output 
tend  to  be  well-suited  for  an  AR  model . [8] , [9] 

Physically  motivated  as  the  Gaussian  mixture  model  is,  it  does 
not  enjoy  the  mathematical  simplicity  associated  with  a  Gaussian 
model. [10]  For  example,  the  least  squares  estimator  of  the  AR 
parameters  is  no  longer  a  close  approximation  to  the  MLE  for  reasonably 
large  data  records.  This  leads  to  the  problem  of  finding  a  good  method 
to  estimate  the  parameters  that  characterize  a  mixed-Gauss ian  AE 
process.  Maximum  likelihood  estimation  is  the  most  widely  considered 
approach  for  all  mixture-density  estimation  p  oblems. ^ 

This  paper  addresses  the  problem  of  maximum  likelihood 
estimation  of  the  parameters  of  an  AR  process  driven  by  white  noise 
with  a  mixed-Gauss  ian  PDF  given  by  (1).  As  a  preliminary  step,  we 


as  some 


Hg  and  m  to  be  zero  and  o|  and  of  to  be  known  bat  e  anknown. 

The  paper  is  organized  as  follows.  Section  II  describes  the 
Cramer-Rao  bounds  for  the  parameters  of  a  mixed-Gaus s ian  AR  process  and 
attempts  to  interpret  them.  Section  III  points  oat  the  inefficiency  of 
least  squares  estimation  by  relying  on  theoretical  and  experimental 
resalts  .  Section  IV  formulates  the  MLE  for  this  problem  while  section 
V  discusses  its  performance.  Section  VI  summarizes  the  results  and 
discusses  possible  applications  of  the  MLE  approach. 


II.  Cramer-Rao  Bounds 


The  approximate  Cramer-Rao  bound  for  the 
the  location  parameter  or  mean  p,  driving 
autoregressive  filter  parameters  is  known  for  a 
order  AR  process 


unbiased  estimator  of 
noise  variance  and 
finite  variance  p-th 


x 

n 


P 

n  ~  2 
j=l 


(Xn-j-  fi)  +  un 


(2) 


un  has  a  symmetric  PDF  f  with  variance  o*  «nd  a^  are  the 
autoregressive  filter  parameters.  The  Fisher  information  matrix  for  the 
parameter  vector  0  =  [  M  flT  o*  JT,  assuming  that  {  ij,  x2,  ...  xjj  }  are 
observed,  is  given  by 
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which  may  be  shown  to  be  approximately^**^ '  W2\ 
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0^  is  a  pxl  vector  of  zeroes  and  C  is  the  pxp  covariance  mati ix  of  the 


time  series.  This  result  is  asymptotic  (true  for  large  data  records) 
because  the  contribution  of  the  first  p  samples  has  been  ignoi ed . ^*^ 
No  results  are  available  for  finite  data  records. 


It  is  of  interest  to  explicitly  determine  If  and  for  a 

Gaussian  distribution.  In  this  case 
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Substitution  of  (4)  and  (5)  into  (3)  produce  the  well  known  results  for 
the  CR  bounds  for  Gaussian  AJR  processes.  These  results  will  be 
useful  later. 


Returning  to  the  general  non-Qaussian  case,  we  are  interested  in 
determining  the  CR  bounds  for  a  and  o*  only.  |i  is  assumed  to  be  known 
and  equal  to  zero.  The  block-diagonal  nature  of  I0  makes  it  possible  to 
invert  it  by  inverting  each  block.  Therefore,  the  covariance  matrix  of 
an  unbiased  estimator  1  of  a  is  bounded  by 
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while  the  matrix  inequality  means  that  the  difference  of  the  right  and 
left  side  matrices  is  positive  semide f inite .  £n  is  the  "normalized" 
covariance  matrix  obtained  by  dividing  C  by  a1  so  that  Cn  depends  on)y 
on  the  AR  filter  parameters.  It  can  be  noted  from  (4)  that  in  the 
Gaussian  case  <J2If  is  unity  and  the  CR  bound  for  a  becomes  o2C-1(N-p) 
=  C  1/(N~p)  in  accordance  with  known  results. 


For  a  fixed  variance  a 2  the  factor  If  is  smallest  and  hence  the 
AR  f  ilter  parameters  are  most  difficult  to  estimate  in  the  Gaussian 
case.  This  can  be  shown  easily  from  the  Cauchy-Schwar tz  inequality 
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(7) 


Note  that  equality  holds  for  a  Gaussian  distribution,  as  shown  in  (4). 
For  any  other  symmetric  distribution  with  variance  a2,  1^  is  larger 

than  this  minimum  value.  In  fact  the  Gaussian  PDF  is  the  unique  PDF 
satisfying  the  equality.  To  show  this  consider  zero  mean  random 
variables  A  and  B  where  A  -  f'(u)/f(u)  and  B  =  u.  Then  if  (7)  hoids 
with  equality 

E(A*)E(B1)  =  [  E(AB)  ]2 
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This  implies  that 


E(A2 )  E(AB) 


E(B*) 


E(AB) 


where  c  is  some  constant. 


Therefore  E(A* )  =  cE(AB)  =  c2E(B2) 


E(A2 )  +  c2E(B2 )  -  2cE(AB)  =  0 


E(A2  -  2cAB  +  c2B2)  =  0 


E(A  -  cB)2  =  0 


/  ”  (A  -  cB  )2  f(u)  du  =  0. 


f(u)  and  (A  -  cB)2  are  both  non-negative  over  the  whole  range  of 
integration.  Since  we  are  only  interested  in  A  and  B  for  those  values 
of  u  which  have  non-zero  probability,  (A  -  cB)2  has  to  be 
identically  zero  over  the  whole  domain  of  f. 


Therefore  A  =  cB 


This  equality  implies 
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Integrating  both  sides  yields 


lnf(u)  =  cu*/2  +inK  (where  K  is  some  constant) 


ie,  f(u)  =  Kexp(cu*/2) . 


This  is  a  normal  disti  button  with  mean  0  and  variance  ~l/c. 


(7)  allows  us  to  compare  the  CR  bounds  for  a,  as  given  by  (6),  for 
any  symmetric  PDF  of  the  driving  noise  to  that  for  the  Gaussian  driving 
noise.  An  obvious  implication  of  (7)  is  that  it  is  possible  to  estimate 
the  AR  parameters  more  precisely  in  the  non~Gaussian  case  than  in  the 
Gaussian  case.  In  summary. 


and 


Cov  (a)  1  - —  .  — — 

a*lf  N-p 

Cov  (a)  1  ---•  C 


in  the  non-Gaussian  case 


in  the  Gaussian  case  (8) 


I 

I 

l£  has  anothei  interpretation.  It  is  the  Fisher  information  for 
the  estimator  of  the  mean  or  the  location  parameter  of  a  univariate 
PDF.  This  may  be  observed  by  setting  aj=0,  j=l,2,...p  in  the  (1,1) 
element  of  Iq  as  given  in  (3)  and  noting  that  the  factor  (N-p)  is  due 


to  accumulation  of  information  from  (N-p)  samples,  ignoring  the 
contribution  of  the  first  p  samples.  The  result  also  holds  for  tie  mean 
of  AR  p  rocesses  except  for  a  multiplying  constant.  This  means  that 
given  two  AR  processes  having  identical  power  spectral  densities  but 
different  PDF's  of  driving  noise  with  equal  variances,  it  wm  be 
easier  to  estimate  the  AR  filter  parameters  of  that  process  foi  which 
estimation  of  the  mean  is  easier.  For  the  specific  case  of  a  zero  mean 
Gaussian  mixture  model  described  in  (1) 


o2  =  (l-e)crg  +  eoj 


2  j 

°g  and  a»  are  assumed  to  be  known,  I£  can  be  computed  in  this  case  as 
follows  : 
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This  integral  can  be  evaluated  using  standard  numerical  techniques.  ■ 

a* can  be  thought  of  as  an  index  of  non-Gaussianity  where  positive 
departures  from  1  indicate  a  higher  degree  of  non-Gaussianity  as  far  as 
the  estimation  of  the  AR  filter  parameters  is  concerned.  Figure  1  plots  ! 

j 

log(oaIf)  vs.  e  for  ag  =  1  ,  cr|  =  1000  and  og  =  1  ,  aj  =  100.  For  e  =  0  j 

and  e  =  1  the  distribution  degenerates  to  a  Gaussian  one  with  variance  j 

®g  and  Oj,  respectively.  In  both  these  cases  log(o*If)  =  0.  The  plot 

i 

shows  how  much  improvement  can  be  expected  in  the  preciseness  of  the  i 

estimation  of  AR  filter  parameters  for  different  values  of  e.  For  ! 

example,  for  e  =  0.1,  Og  =  1  and  Oj  =  1600,  the  covariance  matrix  is  1 

i 

scaled  down  from  the  Gaussian  case  by  a  factor  of  about  100.  Y/e  wiir  be  j 

primarily  interested  in  values  of  e  <  0.2. 


The  CR  bound  on  variance  can  also  be  computed  numerically.  From 
(3)  it  is  evident  that 
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For  the  Gaussian  mixture  model  this  becomes 
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using  notations  of  (10)  and  (11).  The  integral  can  be  evaluated 


Figures  2  and  3  show  two  typical  non-Gauss ian  autoregressive  time 
series  along  with  the  driving  noise  time  series.  They  provide  some 
insight  as  to  why  it  should  be  possible  to  estimate  the  A£  parameters 
more  precisely  in  the  non— Gaussian  case.  The  Gaussian  mixture  model  is 
characterized  by  the  presence  of  large  spikes  in  the  driving  noise  time 
series  (  Figures  2a,  3a  ),  due  to  the  high  variance  Gaussian  component. 
The  spikes  act  as  impulses  to  the  input  of  the  AR  filter  and  result  in 
"ringing'  at  the  output  (  Figures  2b,  3b  ).  This  ringing  is  actually 
the  impulse  response  of  the  filter  which  temporarily  dominates  the  row 
variance  component  at  the  output.  It  is  probably  these  impulse 
responses  that  carry  additional  information  about  the  frrter 
parameters. 


III.  Least  Squares  Techniques 


The  usual  AR  parameter  estimation  techniques  (eg.  Autocorrelation, 
Covariance,  Forward-backward  etc.)^  do  not  enjoy  the  property  of 
asymptotic  efficiency  in  the  non-Gaussian  case.  Although  they  aie  stm 
computationally  efficient,  they  perform  much  poorer  than  the  MLE. 


Two  typical  ARM)  processes  ^  have  been  chosen  for  computer 
simulations.  The  parameters  are  given  in  Tabie  1.  o2  is  assumed  to  be 
unity  for  both  processes.  Process  I  is  broadband  while  process  II  is 
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narrowband.  50  Forward-backward  spectral  estimates  have  been  plotted 
for  these  two  processes  in  Figure  4.  The  sample  means  and  sample 
variances  of  a  and  a*  have  been  listed  and  compared  against  the  CR 
bounds  in  Table  II.  (  The  performance  of  the  MLE  will  be  described  in 
section  IV.)  The  results  are  based  on  500  experiments  with  1000  data 
samples  in  each,  <jg  and  oj  are  equal  to  1  and  100  respectively.  Also 
the  mixture  parameter  is  e  *  0.1.  The  AR  process  was  generated  by 
passing  a  white  mixed-Gauss ian  process  through  a  filter,  allowing 
sufficient  time  for  the  transients  to  decay.  The  white  process  was 
generated  by  randomly  selecting  from  two  mutually  independent  white 
Gaussian  processes  with  PDF's  Eg(u)  and  Ej(u)  (having  variances  og  and 
Oj  respectively)  on  the  basis  of  a  series  of  Bernoulli  trials  with 
probability  of  success  e.  Thus  a  random  variable  can  be  expected  to 
come  from  population  I  for  (l-s)  fraction  of  times  and  from  population 
II  for  e  fraction  of  times  so  that  the  overall  distribution  is  as  given 
in  (1). 

The  CR  bounds  for  the  Gaussian  case  (see  (8)  )  have  also  been 
listed  in  table  II.  As  expected,  the  bound  for  the  AR  parameters  in  the 
Gaussian  case  is  much  higher  than  those  in  the  non-Gaussian  case.  Jt  is 
interesting  to  note  that  the  performance  of  the  Forward-backward 
estimator  approaches  the  Gaussian  CR  bound.  This  confirms  the  well 
known  result  that  the  least  squares  estimates  are  asymptotically 
Gaussian  with  mean  equal  to  the  true  parameters  and  covariance  matrix 
equal  to  the  Gaussian  CR  bound.  However,  as  will  be  shown  in  in 


section  V,  the  MLE  attains  the  trne  CR  bound,  better  by  a  factor  of  10 
in  accordance  with  Figure  lb. 

Clearly,  the  Forward-backward  estimator,  which  is  typical  of  all 
least  squares  methods,  can  not  take  advantage  of  the  presence  of  the 
"contaminating*  process.  The  performance  of  the  covariance  method  was 
found  to  be  about  the  same.  (The  exact  results  for  the  covariance 
method  has  not  been  reported  here.)  This  confirmation  of  the  expected 
inefficiency  of  the  least  squares  techniques  ^2] ,  [14]  can8  for  ase 
of  a  more  efficient  method  which  will  be  able  to  exploit  the  reduction 
in  the  CR  bounds.  One  such  method  is  the  MLE  which  is  discussed  in  the 
next  section. 


IV.  Maxisram  Likelihood  Estimation 


In  this  section  a  Newton-Raphson  search  algorithm  is  proposed  for 
the  (p+1) -dimensional  maximization  of  the  conditional  likelihood 
function.  The  likelihood  function  is  given  by  the  joint  PDF  of  the 
observed  AR  process  which  can  be  obtained  from  the  joint  PDF  of  tbe 
driving  noise  as  follows.  An  AR(p)  time  series  is  linearly  related  to 
the  driving  noise  time  series 


=  I  a.x 

i-o  J 


(14 ) 


where  the  identification  Sq  =  1  has  been  made.  This  is  just  another  way 
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of  writing  (2),  which  makes  it  possible  to  determine  the  conditional 


likelihood  function  of  [  x^  . . .  x^  ] 1  in  terms  of  the  joint 
distribution  f  of  u  =  [  Up+2  ...  u^j  The  transformation  is 
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The  Jacobian  of  the  transformation  from  [  xp+j  xp+2  •••  XN  ^  to  u  is 
just  A.  The  determinant  of  the  Jacobian  is  unity  so  that  the  joint  PDF 
f  of  t  xp+1  xp+2  ...  1  given  ij,  x2,  ...  xp  is 


Hr  .I...  r.f  I  x  ■  x  .  i 

p+l  p-r2  N'  p  p-1 
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X.)  =  f(  u(x)  )  =  T  f(u  (x)) 
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I J 

Then 


f(Xp+l'Xp+2'  XN  I  VXp-l’  **•  Xl) 


N 

(l-s) 

1  /p  Y 

e 

1  /  1  \* 

=  T 

n=p+l 

I - r  exp 

i'Unol 

.'^Uo*jVy. 

+  rr-»  exp 

»  2  710  j 

^  Uo"Vj). 

. 
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A  Newton-Raphson  iteration  step  foi  0  -  [  aA  o  J  is 


5 


where  G  = 


9  Inf  dlnf 


dlnf  dlnf 


and  H  = 


d*2  3 

*p  da' 

d*lnf 

d  Inf 

d  Inf 

• 

da.da 

1  P 

da^da* 

• 

d*lnf 

d*lnf 

d  Inf 

da  da, 

P  1 
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P 

da  do* 
P 

d*lnf 

d*lnf 

d*  Inf 
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•  •  • 

da*  da 

P 

do** 

ff 


(16) 


(17) 


Appendix  A  gives  a  detailed  derivation  of  the  entries  of  G  and  H. 
Although  the  expansions  are  complicated,  they  exhibit  some  structure. 
This  would  allow  partly  concurrent  computation  of  the  gradient  and 
hessian  entries  in  each  step  of  Newton-Raphson  iterations.  The 
estimates  from  the  Forward-backward  method  can  serve  as  initial 
estimates. 


For  large  data  records  the  Hessian,  evaluated  at  the  true  values 
of  the  parameters,  approaches  the  negative  of  the  Fisher  information 
matrix  by  the  law  of  ia>ge  numbers.  The  Fisher  information  matrix 
is  known  to  be  positive  definite.  Therefore  the  negat ive  of  the  Hessian 
will  be  positive  definite  when  the  parametei  vector  is  close  to  its 
true  value.  Since  the  MLE  is  consistent,  the  maximum  of  the  likelihood 
function  is  expected  to  occur  close  to  the  true  value  of  the  paiamete: 
vector.  Hence  the  Hessian  will  be  negative  definite  in  the  neighborhood 
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of  the  maximum  of  the  likelihood  function,  implying  that  the  function 
is  convex  in  that  region.  This  property  of  the  Hessian  can  be  utilized 
to  avoid  the  matrix  inversion  required  by  (15)  in  the  following  way. 
Rewriting  (IS)  as 

<~5>§i+l  =  <~§>§i  +  §  (18) 

it  is  possible  to  compute  the  right  hand  side  from  6^.  Thus  (18)  is  a 
set  of  linear  equations  which  can  be  solved  by  a  Cholesky 
decomposition. 

The  first  and  second  order  derivatives  should  be  scaled  by  1/N  in 
order  to  avoid  the  possibility  of  the  terms  growing  unmanageably  large. 
The  motivation  for  scaling  down  the  terms  is  best  understood  by 
considering  the  case  of  optimization  over  a  single  parameter  assuming 
the  other  parameters  to  be  known.  The  Fisher  information,  which  is  a 
diagonal  element  of  Iq  as  given  by  (3),  increases  as  N  increases.  H  (a 
scalar  in  this  case)  also  increases  accordingly,  being  of  the  order  of 
the  Fisher  information  in  magnitude,  and  has  to  be  scaled  down. 
Multidimensional  optimization  would  be  even  more  difficult  without 
scaling. 

A  difficulty  in  obtaining  convergence  of  the  Newton-Raplison 
iteration  is  the  apparent  weak  dependence  of  the  likelihood  function 
upon  a1  (or  equivalently,  e).  A  transformation  has  been  successfully 
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used  to  circumvent  this  difficulty.  Appendix  B  addresses  this  problem 
in  detail. 


While  the  theoretical  proof  of  convexity  of  the  likelihood 
function  for  finite  data  records  is  not  available,  the  simulations  seem 
to  support  this.  The  results  of  the  simulations  are  discussed  in  the 
following  section. 


V.  Computer  Simulations  of  the  Performance  of  the  MLE 


The  least  squares  estimates  obtained  by  the  Forward-backward 
method  were  chosen  as  initial  iterates  for  the  Newton-Raphson  iteration 
required  to  find  the  MLE.  An  error  criterion  was  defined  and  the 
iteration  was  considered  to  have  converged  if  the  criterion  was 
satisfied.  A  maximum  of  100  iterations  was  allowed.  The  error  criterion 


p+i  eS  ef1 


<  R 


where  Oj  is  the  ith  iterate  for  the  jth  element  of  0.  R  can  be  chosen 
on  the  basis  of  on-line  experience  about  the  percentage  of  realizations 
that  converge  for  a  given  value  of  R.  R=10-3  seemed  to  work  well  in  the 
cases  reported  here.  A  transformation  on  e  (described  in  Appendix  B) 


■  IV  >1.  ; 


was  used  to  enhance  convergence,  as  was  mentioned  in  section  IV. 
Typically  the  iterations  converged  in  4-6  steps  and  less  than  1%  of 
thea  failed  to  converge.  The  results  to  be  described  do  not  reflect 
the  experiaents  resulting  in  failure  to  convergence. 

50  realizations  of  the  MLE  spectral  estiaator  for  the  two  typical 
ARM)  processes  described  in  section  III  have  been  plotted  in  Figure  5. 
While  it  shows  only  aoderate  improvement  upon  the  Forward-backward 
estiaates  plotted  in  Figure  4,  it  should  be  noted  that  Figure  5a  has 
less  crossovers  than  Figure  4a.  This  snggests  that  the  variability  of 
<r*  aight  be  the  aajor  factor  behind  scattering  the  plots  apart,  an 
explanation  coufiraed  by  Figures  6  and  7  which  compare  the  two  methods 
with  the  estiaate  of  the  variance  replaced  by  the  true  variance.  They 
indicate  a  considerable  iaproveaent  for  the  MLE  while  the  Forward- 
backward  estiaates  iaiprove  only  slightly. 

Table  III  coaroares  the  sample  mean  and  saiaole  variance  of  the  MLE 
estiaators  based  on  500  experiments  to  the  CR  bound.  The  number  of  data 
points  in  each  experiment  is  1000.  The  maximum  likelihood  estimators 
indeed  appear  to  be  efficient  except  for  the  estimator  of  variance 
which  disolays  a  much  higher  variability  than  the  CR  bound. 

Estimating  <r*  in  this  case  is  equivalent  to  estimating  e,  which 
represents  the  fraction  of  times  the  high  variance  Gaussian  component 
appears  in  the  observed  data.  It  is  comprehensible  that  for  small  e 
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this  estimation  will  suffer  from  the  difficulties  associated  with 
determining  the  nrobabil it ies  of  rare  events.  Very  large  data  records 
will  be  necessary  to  have  a  reasonably  good  estimate.  Anoarently, 
N»1000  was  not  large  enough  in  this  case.  However  it  should  be 
mentioned  that  the  difficulty  in  estimating  the  driving  noise  power  may 
not  be  taportant  in  some  applications.  For  example,  in  some  detection 
problems  it  is  possible  to  make  hypothesis  tests  invariant  to 

Although  convergence  was  not  a  major  problem  for  N-1000,  shorter 
data  records  did  exhibit  more  sensitivity  to  initial  conditions.  The 
reason  is  attributed  to  the  possible  non-qnadratic  nature  of  the  log 
likelihood  function  which  makes  convergence  of  the  Newton-Raphson 
iteration  more  difficult. 

VI.  Summary 

The  Cramer-Rao  bound  for  the  estimators  of  the  parameters  of 
non-Gauss  ian  AR  processes  was  observed  to  be  less  than  that  for 
Gaussian  AR  processes.  The  performance  of  the  popular  least  squares  or 
linear  prediction  techniques  however  only  achieve  the  Gaussian  CR 
bound.  The  MLE  technique,  although  computationally  intensive  compared 
to  standard  linear  prediction  approaches,  yields  more  accurate 
estimates  for  non-Gaussian  AR  processes.  Simulation  results  indicate 
that  the  MLE  is  asymptotically  efficient.  It  aoooears  to  be  a  viable 
approach  for  parameter  and  spectral  estimation  of  non-Gaussian  AR 
processes  and  may  also  be  applicable  to  linear  prediction  of  time 


series  and  detection  of  signals  in  noise.  Fntnre  work  will  address 


means  to  rednce  the  computational  burden  of  the  MLE.  Also,  the 
extension  of  this  work  to  the  case  of  unknown  ratio  of  background  and 
interfering  noise  powers  will  be  studied. 
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APPENDIX  A 


Computation  of  Gradient  and  Hessian  in  Newton-Ranhson  iterations 


From  section  IV,  the  approximate  (conditional)  log  likelihood 
function  can  be  written  as 
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From  (13) ,  n  =  lax 
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it  follows  that 


a  a  a 
3o  ox  -  oB 


dint 


V* B  38 


1  N  -E_(n  )  +  E_(n  ) 

_  ^  B  n _ I  n 

or  -o_  n=p+l  F_(n  >+F  (n  ) 
B  I  B  n  I  n 


(A-4) 


'  W,  W 

3  o* 


F_(n  )  F  (n  )  F  (n  )  F  (n  ) 

B  n  I  n  o  n  +  l  n 

a  a  a  a  a 

3  Inf  N  Og  (fj  N  3  ofi  <rJ 

da  da .  n^+l*"”**"' j  F_(n  )+  F_(n  )  n=n+l  "n  n_k  3a.  F_(u  )+  F_(n  ) 

k  j  Bnln  kl  B  n  l  n 

Let  F_(n  )/o*  +  F  (n  )/o*  =  T(n  ) 

B  n  B  Ini  n 

F_(n  )  +  F  (n  )  =  B(n  ) 

Bn  In  n 


3T(n  ) 
n 


W 


n  3n 
n  _ n 

*B38k 


FT(n  ) 
I  n 


n  3n 
n  _ n 

a*k 


-  n  x  .  t  F  (n  )/o_  +  F  (n  )/oT] 
n  n-k  B  n  B  Ini 


F„(n  )  F_(n  )1* 


da  3a.  <y-<y_  n=t>+l  3e  |  FB(n  )+  F_(n  ) 

J  Id  ion  in 


*  *  ^  tXn-jnn 

ffg-erj  n=p+l 


W  W 


r  ( 


F-,(n  )+F_(n  )  1 
Bn  I  n  J 


[  W*  W]* 


w,  w 

i  a 


i1  ivv-w] 


[  W  W]‘ 


It  follows  that 
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Equations  (A-3)  to  (A-7)  coiaoletely  determine  the  gradient 


vector  and  the  Hessian  matrix.  nQ,  Ej(nn),  ^^n^'  Fj(nn)  and  FjCn^ 
need  to  be  comonted  only  once  for  every  n  in  each  iteration  and  can  be 
nsed  to  find  all  the  first  and  second  derivatives. 
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Use  of  transformation  on  o* 


Experiments  show  a  rather  weak  deoendence  of  the  log  likelihood 
function  on  er*  (or  equivalently  s).  This  problem  can  cause  the  (o+l)~ 
dimensional  optimization  not  to  converge.  A  possible  solution  is  to  use 
some  transformation  q  3  g(s)  such  that  q  increases  slowly  with  e.  If  a 
proper  transforsmtion  is  chosen  Inf  will,  hopefully,  show  a  distinct 
peak  when  optimized  over  q.  The  intuitive  idea  behind  this  technique  is 
to  increase  the  sensitivity  of  the  function  to  the  variable  over  which 
it  is  to  be  optimized.  Since  g  is  an  increasing  function  of  s,  the 
optimum  value  of  q  will  correspond  to  a  unique  value  of  e.  An  example 
of  such  a  transformation  is 
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Table  I  :  Parameters  of  the  AR  orocesses  used  for  simulation: 


0. 98exn [ j2n( 0. 11 ) ] 
0.98exp[j2n(0.14)  ] 
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Table  II  :  Performance  of  the  Forward-Backward  Estimators 
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Table  III  :  Performance  of  the  Maximum  Likelihood  Estimators 
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Figure  3.  Mixed-Gauss ian  time  series  at  the  input  and  output  of  an  AR  filter 
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